Text Mining Improves Prediction of Protein Functional Sites

نویسندگان

  • Karin M. Verspoor
  • Judith D. Cohn
  • Komandur E. Ravikumar
  • Michael E. Wall
چکیده

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detection of Protein Catalytic Sites in the Biomedical Literature

This paper explores the application of text mining to the problem of detecting protein functional sites in the biomedical literature, and specifically considers the task of identifying catalytic sites in that literature. We provide strong evidence for the need for text mining techniques that address residue-level protein function annotation through an analysis of two corpora in terms of their c...

متن کامل

Prediction of GPCR-G Protein Coupling Specificity Using Features of Sequences and Biological Functions

Understanding the coupling specificity between G protein-coupled receptors (GPCRs) and specific classes of G proteins is important for further elucidation of receptor functions within a cell. Increasing information on GPCR sequences and the G protein family would facilitate prediction of the coupling properties of GPCRs. In this study, we describe a novel approach for predicting the coupling sp...

متن کامل

Prediction of Protein Binding Sites by Combining Several Methods

We have combined four different methods for the prediction of binding sites in proteins. The methods used included: data mining approach by Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich. We have combined all these method...

متن کامل

In silico investigation of lactoferrin protein characterizations for the prediction of anti-microbial properties

Lactoferrin (Lf) is an iron-binding multi-functional glycoprotein which has numerous physiological functions such as iron transportation, anti-microbial activity and immune response. In this study, different in silico approaches were exploited to investigate Lf protein properties in a number of mammalian species. Results showed that the iron-binding site, DNA and RNA-binding sites, signal pepti...

متن کامل

BRENDA in 2017: new perspectives and new tools in BRENDA

The BRENDA enzyme database (www.brenda-enzymes.org) has developed into the main enzyme and enzyme-ligand information system in its 30 years of existence. The information is manually extracted from primary literature and extended by text mining procedures, integration of external data and prediction algorithms. Approximately 3 million data from 83 000 enzymes and 137 000 literature references co...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2012